Robustness and Statistical Significance of PAM-like Matrices for Cognate Identification

Authors

  • Antonella Delmestri
  • Nello Cristianini
Abstract

This paper tests the influence of training dataset size on a recently proposed orthographic learning system, inspired by biological sequence analysis and successfully applied to cognate identification. The system automatically aligns a given set of cognate pairs to produce a meaningful training dataset, learns substitution parameters from it using a PAM-like technique, and uses them to recognise cognate pairs. The results show no difference in performance whether the system is trained with about 650 cognate pairs extracted from 6 Indo-European languages or with about 62,000 cognate pairs extracted from 76 Indo-European languages. In both cases the system outperforms all comparable orthographic and phonetic methods previously proposed in the literature. This paper also investigates the statistical significance of these results when compared with earlier proposals. The outcome confirms that the performance reached by this system with both training datasets is significantly higher than that achieved by all the other methods. Indeed, training dataset size seems to influence neither the accuracy nor the statistical significance of this learning system, which needs only a very small amount of data to reach an outstanding performance.
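The abstract does not spell out the learning step, so the following is only a minimal sketch of the general idea behind log-odds substitution scores learned from aligned pairs. It is not the authors' procedure: real PAM-style matrices also extrapolate a mutation probability matrix to larger evolutionary distances, which is omitted here. The function names, the pseudocount, and the three toy Italian/Spanish pairs (already aligned, equal length) are assumptions made for illustration.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, pseudocount=1.0):
    """Estimate symmetric log-odds substitution scores from aligned pairs.

    Simplified sketch: no gap handling and no PAM-style extrapolation.
    """
    pair_counts = Counter()
    char_counts = Counter()
    for a, b in aligned_pairs:
        if len(a) != len(b):
            raise ValueError("aligned strings must have equal length")
        for x, y in zip(a, b):
            # Store each unordered pair under a sorted key so counts are symmetric.
            pair_counts[tuple(sorted((x, y)))] += 1
            char_counts[x] += 1
            char_counts[y] += 1

    alphabet = sorted(char_counts)
    total_chars = sum(char_counts.values())
    n_keys = len(alphabet) * (len(alphabet) + 1) // 2
    total_pairs = sum(pair_counts.values()) + pseudocount * n_keys

    scores = {}
    for i, x in enumerate(alphabet):
        for y in alphabet[i:]:
            observed = (pair_counts[(x, y)] + pseudocount) / total_pairs
            px = char_counts[x] / total_chars
            py = char_counts[y] / total_chars
            # Chance probability of seeing the unordered pair {x, y}.
            expected = px * py if x == y else 2 * px * py
            value = math.log2(observed / expected)
            scores[(x, y)] = scores[(y, x)] = value
    return scores


def score_alignment(a, b, scores):
    """Sum the substitution scores over an equal-length alignment."""
    return sum(scores[(x, y)] for x, y in zip(a, b))


# Toy training data (assumed for illustration): Italian vs. Spanish cognates.
training = [("notte", "noche"), ("latte", "leche"), ("otto", "ocho")]
scores = log_odds_matrix(training)
cognate_score = score_alignment("notte", "noche", scores)
mismatch_score = score_alignment("notte", "leche", scores)
```

With this toy data, the recurring correspondence t/c receives a higher score than the unobserved pairing t/e, so the true cognate pair scores above the mismatched one. A classifier along these lines would then threshold or rank such scores.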

Similar resources

String Similarity Measures and PAM-like Matrices for Cognate Identification

We present a new automatic learning system for the identification of cognates, words that derive from a common ancestor and share the same etymological origin. Our approach combines and adapts several techniques developed for biological sequence analysis to the natural language processing environment. We design a linguistically inspired matrix to align our training dataset sensibly. We introduce a ...

Linguistic Phylogenetic Inference by PAM-like Matrices

We apply a successful cognate identification learning model based on PAM-like matrices to the task of linguistic phylogenetic inference. We train our system and employ the learned parameters to measure the lexical distance between languages. We estimate phylogenetic trees using distance-based methods on an Indo-European database. Our results reproduce correctly all the established major la...

Application of Polyacrylamide for Splash Erosion Control on Marl Soil

Splash erosion is recognized as the first stage in the process of erosion that results in bombardment of the soil's surface with rain drops. Two basic processes in soil erosion are the dispersal of soil particles by rain drops and the changes caused to the soil's structure, which are then moved by runoff. In this research, the effect of various polyacrylamide (PAM) values (0, 0.2, 0.4 and 0.6 ...

Robust Fuzzy Gain-Scheduled Control of the 3-Phase IPMSM

This article presents a fuzzy robust Mixed-Sensitivity Gain-Scheduled H controller based on the Loop-Shaping methodology for a class of MIMO uncertain nonlinear Time-Varying systems. In order to design this controller, the nonlinear parameter-dependent plant is first modeled as a set of linear subsystems by Takagi and Sugeno’s (T-S) fuzzy approach. Both Loop-Shaping methodology and...

The Role of Individual and Contextual Characteristics in Predicting Resilience Among Child/Teens Living at Family-Like Community Centers

Introduction: Resilience, as a positive psychological construct, has gained significance in psychological research. The main goal of the current study is to identify the role of individual and contextual characteristics in predicting resilience among child/teens living at family-like community centers. Methods: This study is cross-sectional-descriptive with the correlational method. The partici...

Publication date: 2010